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Compressive Sensing Using Low Density Frames 

Mehmet Akxjakaya, Jinsoo Park and Vahid Tarokh 



Abstract — We consider the compressive sensing of a sparse or 
compressible signal x G R M . We explicitly construct a class of 
measurement matrices, referred to as the low density frames, 
and develop decoding algorithms that produce an accurate 
estimate x even in the presence of additive noise. Low density 
frames are sparse matrices and have small storage require- 
ments. Our decoding algorithms for these frames have O(M) 
complexity. Simulation results are provided, demonstrating that 
our approach significantly outperforms state-of-the-art recovery 
algorithms for numerous cases of interest. In particular, for 
Gaussian sparse signals and Gaussian noise, we are within 2 
dB range of the theoretical lower bound in most cases. 

Index Terms — Low density frames, compressive sensing, sum 
product algorithm, expectation maximization, Gaussian scale 
mixtures 

I. Introduction 

LET x = (xi,...,x M ) G R M with ||x|| = \{Xi : Xi ^ 
0}| = L. x is said to be sparse if L « M. Consider 
the equation 

y = Ax + n, (1) 

where A is an JV x M measurement matrix and n £ M. N is a 
noise vector. When N < M, y is called the compressively 
sensed version of x with measurement matrix A. In this 
paper, we are interested in coming up with a good estimate 
x for a sparse vector x from the observed vector y and the 
measurement matrix A. 

We refer to the case n = as noiseless compressive sensing. 
This is the only case when x can be perfectly recovered. In 
particular, it can be shown [13], [30] that if A has the property 
that every of its N columns are linearly independent, then a 
decoder can recover x uniquely from N = 2L samples by 
solving the £ a minimization problem 

min||x||o s. t. y = Ax. (2) 

However, solving this £q minimization problem for general 
A is NP-hard [42]. An alternative solution method proposed 
in the literature is the l\ regularization approach, where 

min||x||i s. t. y = Ax, (3) 

is solved instead. Criteria has been established in the literature 
as to when the solution of |3]l coincides with that of |2]) in the 
noiseless case for various classes of measurement matrices 
[13], [16]. In an important contribution, for A belonging to 
the classes of Gaussian and partial Fourier ensembles, Candes 
and Tao showed [13] that this recovery problem can be solved 
for L = O(M) with N = O(L) as long as the observations 
are not contaminated with (additive) noise. 
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It can be shown that there is a relationship between the 
solution to problem ([T]l and minimum Hamming distance 
problem in coding theory [1], [2], [35]. This approach was 
further exploited in [50]. Using this connection, we con- 
structed ensembles of measurement matrices3 and associated 
decoding algorithms that solves the £ minimization problem 
with complexity 0(MN) for L = 0{M) with N = O(L) in 
the noiseless case [1], [2]. 

For problem ([TJ with non-zero n, referred to as noisy 
compressive sensing, the t\ regularization approach of ([3]) can 
also be applied. For a measurement matrix A that satisfies 
a property called the restricted isometry principle (RIP), the 
quadratic program 

min||x||i s. t. ||Ax — y 1 1 2 < e, 

can be solved for a parameter e related to ||n||2, and an 
estimate xqp can be obtained such that ||xqp — x||g < C\ e, 
where C\ is a constant that depends on A [11]. If n ~ 
A/"(0, er 2 Ijv), another approach is the Dantzig Selector 

min||x||i s. t. ||A*(Ax - y)^ < 7, 

where 7 is a function of a and M. This gives an estimate 
x DS such that E n ||x DS -x||| < C 2 (log M) £\ mm(x 2 , a 2 ), 
where C2 is a constant that depends on A [12]. Both these 
methods may not be easily implemented in real time with 
the limitations of today's hardware. To improve the running 
time of l\ methods, some authors have investigated using 
sparse matrices for A [6]. Using the expansion properties 
of the graphs represented by such matrices, it was shown 
in [6] that it is possible to obtain an estimate xe such that 
H x e — x lli < C 3 1 |n| I !, where C 3 is a constant that depends 
on A. 

Another strand of work studies recovery algorithms based 
on the matching pursuit algorithm [44]. Recently, variants 
of this algorithm, e.g. Subspace Pursuit [17] and CoSaMP 
[30], have been proposed. Both algorithms provably work for 
measurement matrices satisfying RIP, and guarantee perfect re- 
construction in the noiseless setting for N — 0(L\og(M / L)) 
as the l\ recovery methods do. For the noisy problem, they 
also offer similar guarantees to £\ methods. These algorithms 
have complexity O(ClogR), where £ is the complexity of 
matrix-vector multiplication (O(MN) for Gaussian matrices, 
0(N log N) for partial Fourier ensembles) and R is a preci- 
sion parameter bounded above by 1 1 x 1 1 2 (which is O(N) for a 
fixed signal-to-noise ratio). In [7], Sparse Matching Pursuit 
(SMP) was proposed for sparse A and this algorithm has 
0(Mlog(M/L)) complexity . 

'We use the terms "frame" and "measurement matrix" interchangeably 
throughout the rest of the paper. 
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Yet another direction in compressive sensing is the use of 
the Bayesian approach. In [23], the idea of relevance vector 
machine (RVM) [40] has been applied to compressive sensing. 
Although simulation results indicate that the algorithm has 
good performance, it has complexity 0(MN 2 ). 

In this paper, we study the construction of measurement 
matrices that can be stored and manipulated efficiently in 
high dimensions, and fast decoding algorithms that generate 
estimates with small distortion. The ensemble of measure- 
ment matrices are a generalization of LDPC codes and we 
refer to them as low density frames (LDFs). For our decoding 
algorithms, we combine various ideas from coding theory, 
statistical learning theory and theory of estimation. Simulation 
results are provided indicating an excellent distortion perfor- 
mance at high levels of sparsity and for high levels of noise. 

The outline of this paper is given next. In Section [ITJ we 
introduce low density frames and study their basic properties. 
In Section [Hi] we introduce various concepts used in algo- 
rithms and describe the decoding algorithms. In Section|lV] we 
provide extensive simulation results for a number of different 
testing criteria. Finally in Section [V] we make our conclusions 
and provide directions for future research. 




Fig. 1 . Factor graph representing the function in Equation [6] 

several variables that can be factored as 

/(w)=n^( w *)- < s > 

In this factorization each factor f s is only a factor of w s , the 
subset of variable nodes w. 

A factor graph depicting |5]l consists of variable nodes rep- 
resented by circles, factor nodes represented by bold squares, 
and undirected edges connecting each factor node to all the 
variable nodes involved in that factor. 

As an example, the factor graph representing 

/(w) = fa(w 1 ,W2,W 3 )f b (w2,W 3 ,W 4 )f c (w 1 ,W4) (6) 

is depicted in Figure [T] 



II. Low Density Frames 

Let T = {4>i, <p2, ■ ■ ■ ,4>m} be a frame consisting of 
M > N non-zero vectors which span M. N , Let (pi = 
(<j>ii,--- ,4>N,i) f° r i — 1)2, ••• , M. This frame could be 
represented in matrix form as an N x M matrix 

/ 01,1 01,2 ••• 01, M \ 
02,1 02.2 ' ' • 02, M 

: ■. ■. : ' (4) 

\ 0JV,1 0W,2 ' ' ' 0JV,Af / 

A low density frame (LDF) T is defined by a matrix F 
where the vast majority of elements of each column and each 
row of F are zeroes. Formally, we define a (d v ,d c )-regular 
LDF as a matrix F that has d c non-zero elements in each row 
and d v non-zero elements in each column. Clearly Md v = 
Nd c . We also note that the redundancy of the frame is r = 
M/N — d c /d v . We will restrict ourselves to binary regular 
LDFs, where the non-zero elements of F are ones. 

The density p of a frame F is the ratio of the number of 
non-zero entries of F to the dimension of F. In this paper, 
we consider regular LDFs for which p = (Md v )/(MN) = 
d v /N « 1. 

As with LDPC codes, it is natural to represent LDFs 
using bipartite graphs. Furthermore, there is a well-established 
literature on inference in graphical models. Some of these 
methods can be used as a basis for recovery algorithms in 
the context of compressive sensing. To this end, we next 
summarize two important ideas from graphical models, namely 
factor graphs and the sum-product algorithm, and show how 
LDFs can be viewed as factor graphs. 

A. Factor Graphs 

Factor graphs are used to represent factorizations of func- 
tions of several variables [9], [24]. Let /(w) be a function of 



B. Sum-Product Algorithm 

The natural inference algorithm for factor graphs is the sum- 
product algorithm [9], [21]. This algorithm is an exact inter- 
ference algorithm for tree-structured graphs (i.e. graphs with 
no cycles), and is usually described over discrete alphabets. 
However, the ideas also apply to continuous random variables 
with the sum being replaced by integration. In doing so, the 
computational cost of implementation increases and this issue 
will be addressed later. 

Suppose the goal is to find the marginal density p(w) for a 
particular variable w. In particular, we have 



P(w) = ^p(w). 



w\u; 

One treats w to be the root node of a tree, and looks at the 
subtrees connected to w via factor nodes. Using this approach, 
the joint distribution can be written as 



P (w)= n ftKw.). 



(7) 



s£ne(w) 



where ne(w) is the neighborhood of w, i.e. the set of factor 
nodes that are connected to w, and W s is the set of variable 
nodes in the subtree connected to the factor node f s in ne(w) 
[9]. F s (w,W s ) represents the product of the factors in the 
subtree associated with f s . Interchanging the summation and 
the products yields 



sGne(w) 

where pj s ^ w (w) is the message sent from factor node f s to 
variable node w. One can show [9] that 

w s \w m£ne(f s )\w 
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where w s are all variable nodes connected to the factor node 
f s , including w, and ne(f s ) are the set of variable nodes 
connected to the factor node f s . One can also show [9] 

lGne(w m )\f s 

Thus there are two types of messages, one type going 
from factor nodes to variable nodes denoted f/,f^> w an d the 
other going from variable nodes to factor nodes denoted 
/iu,_>/. The message propagation starts from the leaves of the 
factor graph. A leaf variable node sends an identity message 
fj, w ^f(w) = 1 to its parent, whereas a leaf factor node / sends 
fif^ w (w) = f(w), a description of / to its parent. These 
expressions for messages give an efficient way of calculating 
the marginal probability distribution. We note that in writing 
out the factorization in it is essential that the graph has 
a tree structure so that the factors in the joint probability 
distribution p(w) can be partitioned into groups, each of which 
is associated with a single factor node in ne(w). 

The algorithm is easily modified to calculate the marginal 
for every variable node in the graph [9]. This modification 
results in only twice as many calculations as calculating a 
single marginal. A more interesting case is when there are 
observed variables in the graph, v. In this case the sum-product 
algorithm could be used to calculate posterior marginals 
p(Wi\v = v). 

C. Graphical Representation of Low Density Frames 

The main connection between the £ minimization problem 
and coding theory involves the description of the underlying 
code [1], V of F, where 



V = {de 



Fd = 0}. 



One can view V as the set of vectors whose product with each 
row of F "checks" to 0. In the works of Tanner, it was noted 
that this relationship between the checks and the codewords 
of a code can be represented by a bipartite graph [39]. This 
bipartite graph consists of two disjoint sets of vertices, V 
and C, where V contains the variable nodes and C contains 
the factor nodes representing checks that codewords need to 
satisfy. Thus we have |V| = M and \C\ = N. Furthermore 
node j in V will be connected to node i in C if and only if the 
(i,j) th element of F is non-zero. Thus the number of edges 
of the graph is equal to the number of non-zero elements in 
the measurement matrix F. For an LDF, this leads to a sparse 
bipartite graph. 

A simple example of this graphical representation is de- 
picted in Figure [2] For representation of LDFs, it is convenient 
to use a factor node, depicted by a EH, called a parity check 
node. This node has the property that the variable nodes 
connected to it should sum to zero. We also note that for the 
purposes of decoding, it is more convenient to use syndromes 
[26] that represent the measurement vector, r. This is done by 
connecting a variable node representing the j xh component of 
r to the f h check node. In this case, the parity check node has 
the property that the variable nodes connected to it sum to rj. 
Thus the graph now represents the set {d G R AI : Fd = r} 
which is a coset of the underlying code of the frame. 
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Fig. 2. A frame F and its graphical representation. 

It is important to note that the graph representing an LDF 
will have cycles. Without the tree structure, the sum-product 
algorithm will only be an approximate inference algorithm. 
However, it has been empirically shown that for sparse graphs 
this approximate algorithm works very well [25], [33], [48]. 

III. Sum Product Algorithm with Expectation 
Maximization 

It is well-known in coding theory literature, that the stan- 
dard decoding algorithm for codes on graphs is the sum- 
product algorithm (SPA) [21], [24], [25]. Given a set of 
observations, this algorithm can be used to approximate the 
posterior marginal distributions. In fact, when there is no 
noise, variants of this algorithm [38] has been successfully 
adapted to compressive sensing [35], [50]. However, when 
we are interested in the practical case of noisy observations, 
these algorithms no longer can be applied in a straightforward 
manner. Some authors have tried to circumvent this difficulty 
by using a two-point Gaussian mixture approach [36], however 
the complexity of this algorithm may grow quickly as the 
number of Gaussian components in the mixtures could grow 
exponentially, unless some approximation is applied. However, 
using these approximations degrades the performance of the 
LDF approach. 

In this paper, we consider Gaussian Scale Mixture (GSM) 
priors with Jeffreys' non-informative hyperprior to obtain an 
algorithm that provides estimates for the noisy compressive 
sensing problem 

r = Fx + n, 

as well as the noiseless problem. Throughout the paper we 
assume that 

n~ JV(0,a 2 I N ). 

However simulation results (not included in this paper) indi- 
cate that the algorithms still work well even for non-Gaussian 
noise. We define the signal-to-noise ratio (SNR) as 

i|Fx|| 2 „. llFxIl 2 



SNR= 101og 10 



E n n 



10 l0 gl 



<r 2 N 



A. Gaussian Scale Mixtures 



The main difficulty in using the sum-product algorithm 
(SPA) in compressive sensing setting is that the variables of 
interest are continuous. Nonetheless SPA can be employed 
naturally when the underlying continous random variables are 
Gaussian [47]. Since any Gaussian pdf Af(x\a,A) can be 
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determined by its mean a and variance A, these constitute the 
messages in this setting. At the variable nodes, the product 
of Gaussian probability density functions (pdf) will result in a 
(scaled) Gaussian pdf, and at the check nodes, the convolution 
of Gaussian pdfs will also result in a Gaussian pdf. i.e. 

J\f(x\ai,Ai) * N{x\a 2 ,A 2 ) oc N(x\a x + a 2 , A 1 + A 2 ), 

and 

N{x\ax,A x ) -N{x\a 2 ,A 2 ) oc Af(x\b, B), 
where oc denotes normalization up to a constant, and 
B 

b = B(A^ai + J 4^" 1 a 2 ). 



Gaussian 



GSM with p. 



(A^ + A 2 



We note that all the underlying operations for SPA preserve 
the Gaussian structure. 

It is well-known that the Gaussian pdf is not "sparsity- 
enhancing". Thus some authors propose the use of the Lapla- 
cian prior 

p( x ) = n^*^) = n ^ exp(— |). (8) 

i i 

Clearly with this prior and for Gaussian noise n 

p(x|y) ocp(y|x)p(x) cx exp (- ||y - Ax||| - A'||x||i), (9) 

and maximization of p(x|y) is equivalent to minimizing 

lly-Axll + A'HxIK, 

which is the objective function for the LASSO algorithm 
[43], [46]. However, a straightforward implementation of this 
algorithm may not be computationally feasible. 

In this paper, we consider the family of Gaussian Scale 
Mixtures (GSM) densities [4], given by 

x = \fj3u, 

where u is a zero-mean Gaussian and \/~P is a positive scalar 
random variable. Hence 



Px\p(x\0)~M(x\O,P), 



and 



Px(x) 



Px\p(x\P)Pp(P)df3. 



This family of densities are symmetric, zero-mean and have 
heavier tails than a Gaussian, and have been successfully used 
in image processing [8], [18], [32], and learning theory [40]. 

In order to completely specify our model, we need to choose 
a pdf for pp(P). In this paper, we use 



pptf) cx Vdet (/(/?)), I(P) = E - 



d 2 logp x \p(x 



dp 2 



where J(/3) is the Fisher information matrix. This is referred 
to as the Jeffreys' prior, which can be shown to be a scalar 
invariant prior suitable for sparse estimation [34]. In our 
model, the prior is given by 

Pi 




GSM with independent p s 
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Fig. 3. Contour plots for a Gaussian distribution (left), a GSM with ft = ft 
distributed according to Jeffreys' prior (middle), a GSM with ft and ft i.i.d. 
with Jeffreys' prior (right). 



which has no parameters to optimize. We note that this is an 
improper density, i.e. it cannot be normalized. In Bayesian 
statistics, these kind of improper priors are used frequently, 
since only the relative weight of the prior determines the a- 
posteriori density [34]. This density also has a singularity at 
the origin. This fact is usually ignored as long as it does 
not create computational problems [32]. As an alternative one 
might set the prior to in a small interval (3 £ [0,/3 m in)- We 
also note that with this choice for pp^Pi), p Xi (%i) oc l/|a;j|, 
which is a very heavy-tailed density. 

To enhance sparsity in each coordinate, it is important to 
have independent (3i for all i [41]. As depicted in the middle 
subplot of Figure [3] compared to a Gaussian distribution, a 
GSM with Pi distributed according to Jeffreys' prior has a 
much sharper peak at the origin even when P\= p 2 . However, 
the subplot on the right demonstrates that if the /3jS are indeed 
independent, the GSM will be highly concentrated not only 
around the origin, but along the coordinate axes as well, which 
is a desired property if we have no further information about 
the locations of the sparse coefficients of x. In our model, we 
will assume that 



M 



M 



p(x,/3) = Hp(xi\pi)l[p{pi) 

i=l i=l 

in order to enhance sparsity in all coordinates. This inde- 
pendence assumption is natural and commonly used in the 
literature [18], [43], [46]. 

B. Expectation Maximization 

The expectation maximization (EM) algorithm is a method 
for finding maximum-likelihood (ML) estimates of parameters 
in a model with observed and hidden variables [29]. Let y be 
the observed data and let z be the hidden data. Let the prob- 
ability density function of (y,z) be /(y,z|0), parametrized 
by the vector 0. The EM algorithm iteratively improves on 
an initial estimate 9^ using a two-step procedure. In the 
expectation step (E-step), we calculate 

Q(0|0«)=E z (log/(y, Z |0)|y,0( fe )) 

given an estimate 8^ from the previous iteration. It is im- 
portant to distinguish the two arguments of the Q function are 
different. The second argument is the conditioning argument 
for the expectation and is fixed during the E-step. In the second 
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edges of the graph 




Fig. 4. The factor graph for a (3,6)-regular LDF with the appropriate 
hyperpriors. 



step, called the maximization step or M-step, a new estimate 

0( fe+1 ) = argmaxQ(0|0 (fe) ) 



is calculated. 

It can be shown that the estimates monotonically increases 
the likelihood with respect to the observed data y [29], 

/(y|0 (fe+1) ) > f(y\e {k) ). 

When 6 is itself a random variable, the M-step maximizes 
(Q(0|0W) + log f(0)), and the EM algorithm can be used to 
find a maximum a-posteriori (MAP) estimate of 6 [28]. 

C. SuPrEM Algorithm I 

The factor graph for decoding purposes is depicted in Figure 
|4] Here, r is the vector of observed variables, x is the vector 
of hidden variables and (3 is the vector of parameters. We 
next propose the Sum Product with Expectation Maximization 
(SuPrEM) Algorithm I. At every iteration t, this algorithm 
uses a combination of the Sum-Product Algorithm (SPA) and 
EM algorithm to generate estimates for the hyperpriors {/3^}, 
as well as a point estimate {x^}. In the EM stage of the 
algorithm, Q(f3\f3^) for the E-step is given by 



Q(P\0®)=E x \togp(x,y,0)\y,0W 

= E x (log (p(y|x)p(x|/3)p()3)) |y, /3 (t) ^ 
= E J logp(y|x) |y, /3 W ) + E x ( logp(x, /3) |y, (3^ 

i=l \ ' 

= ci + E ^ ( l °gp(%i,0i)\y, /3 (t) ) , do) 
i=i ^ ' 

where C\ = E x (logp(y|x)|y, /3'*)) is a term independent of 
p. Let Q(A|/3 ( 'b - E^log^A)^,^). We have 

M 
i=l 



Since in our setting, the underlying variables are Gaussian, 
the density p(xi\y, P^) produced by the SPA is also Gaus- 
sian, with mean p t and variance v^. One can explicitly write 

out Q(/?i|/3 (t) ) as 



Q(A|/3«)=E Xi ( logK^.ft)^,^ 



Ex 4 log 



C 2 -^logA-^-E^K 2 |y,/3 (t) ) 

3 1 
C 2 - 2 lo g&- yjSLH+Vi), 



(12) 



where C*2 is independent of /3j. 
For the M-step, we find 

/3 (t+1) = argmaxQ(/3|/3 (t) ). 

Clearly Q(/3\(3 W ) can be maximized by maximizing each 
Q(/3j|/3(*)). Hence we have the simple local update rule 



" + = argniaxO(/3 l |/3 ( * ) ) 



(13) 



We summarize SuPrEM I in Algorithm [T] The inputs to 
the algorithm contain a stopping criterion T and a message- 
passing schedule S. The stopping criterion does not really 
affect the behavior of the algorithm, and there are a few 
alternatives for a reasonable criterion, which are discussed in 



Section IV It turns out the message-passing schedule is rather 



important for achieving the maximum performance. To this 
end, we develop a message-passing schedule that attains such 
good performance and we describe this schedule in detail in 
Appendix I. For all our simulations, we use this fixed schedule. 
Simulation results indicate that with this fixed schedule, the 
algorithm is robust in various different scenarios. The overall 
complexity of SuPrEM is O(M) for a fixed number of 
iterations. We also note that in the presence of noise, the 
output of the algorithm will not be exactly sparse and a sparse 
estimate can be constructed using soft-thresholding techniques 
such as those described in [18]. 

D. SuPrEM Algorithm II 

When the ratio L/N is relatively large, SuPrEM I does not 
perform well, in particular for high SNRs, since it does not 
enforce strict sparsity. Thus we propose SuPrEM Algorithm 
II that enforces sparsity at various stages of the algorithm and 
sends messages between the nodes of the underlying graph 
accordingly. To this end, we keep a set of candidate variable 
nodes O that are likely to have non-zero values, and modify 
the messages from the variable nodes that do not belong to 
a specified subset of O denoted by 0\. Similar ideas have 
been used in developing state-of-the-art recovery algorithms 
for compressive sensing, such as Subspace Pursuit [17] and 
CoSaMP [30]. The full description is given in Algorithm [2] 

The main modification to SuPrEM I is the addition of a 
sparsification step. Intuitively j3^ is the reliability of the 
hypothesis x^l =/= 0. Throughout the algorithm we maintain 
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Algorithm 1 SuPrEM Algorithm I 



a list of variable nodes 0\ that correspond to the largest 
L coefficients of xW at iteration t. We also keep a list of 
variable nodes 2 corresponding to the L largest elements of 
f3^\ i.e. those with the largest reliabilities of the hypothesis 
x^ ^ 0. In the sparsification stage, these two sets are merged, 
O = 0i U 02- The addition and deletion of elements from 
O allow refinements to be made with each iteration. We note 
L < \0\ < 2L at any given iteration. Decisions are made on 
the elements of O, and 0\ is updated. Finally for variable 
nodes not in 0\, the mean value of the messages is forced to 



Algorithm 2 SuPrEM Algorithm II 

Inputs: The observed vector r, the measurement matrix F, 
the sparsity level L, a stopping criterion T, the noise level 
a 2 (optional), and a message-passing schedule S. 

1. Initialization: Let /3* 0) = |(F T r) fe | 2 /d 2 and let O x = 0. 
Initial outgoing messages from variable node Xk is (0,(3^). 

2. Check Nodes: Same as in Algorithm I. 

3. Variable Nodes: Same as in Algorithm I. 

4. Sparsification: 

a. After the j3kS have been updated, find the indices of the L 
largest (3kS- Let these indices be 2 . 

b. Merge 1 and 2 , i.e. Let O = 1 U 2 . 

c. For all indices in k <G O make a decision on Xk (as in Step 
5 of Algorithm I). For all indices k ^ O, let Xk = 0. 

d. Identify the indices corresponding to the L largest (in 
absolute value) coefficients of x. Update 0\ to be this set of 
L indices. 

e. The variable vertices k e Oi, send out their messages as 
was decided in Step 3 of Algorithm I. The variable vertices 
k £ 0\, send out their messages with mean and the 
variance that was decided in Step 3 of Algorithm 1. 

5. Decisions: Make decisions only the vertices in O. Once 
these are calculated, keep the L indices with the largest 
\£k\,k £ O. Set all other indices to 0. 

6. Iterations: Repeat (2), (3), (4) and (5) until stopping 
criterion T is reached. 

Output: The estimate is x = (x\,x 2 , ■ ■ ■ , &m) t ■ 



be 0, but the variance (i.e. the uncertainty about the estimate 
itself) is kept. By modifying the messages this way, we not 
only enforce sparsity at the final stage, but also throughout the 
algorithm. 

We note that the noise level a 2 is an optional input to 
the algorithm. Our simulations indicate that the algorithm 
works without this knowledge also. However, if this extra 
statistical information is available, it is easily incorporated into 
the algorithm in a natural way and results in a performance 
increase. 

SuPrEM II has complexity O(M). The only significant 
operation different than those in SuPrEM I is the determination 
of the largest L elements of (3 and x. This could be done 
with 0{M) complexity, as described in [15] (Chapter 9). A 
more straightforward implementation for this stage might use 
sorting of the relevant coefficients, which would result in a 
higher complexity of 0(M log M) for the overall algorithm. 

E. Reweighted Algorithms 

For high L/N ratios, simulation results show that SuPrEM 
I and SuPrEM II still perform well. However more iterations 
are needed to achieve very low distortion levels, which may be 
undesirable. Thus we propose a modification to SuPrEM I and 
SuPrEM II to speed up the convergence that uses estimates 
generated within a few iterations. In compressive sensing, 
employing prior estimates to improve the final solution has 
been used for i\ approximation [14], but this increases the 
running time by a factor of reweighing steps. 



Inputs: The observed vector r, the measurement matrix F, 
the noise level a 2 , a stopping criterion T, and a message- 
passing schedule S. 

1. Initialization: Let (3^ = |(F T r) fe | 2 /d 2 . Initial outgoing 
messages from variable node Xk is (0,(3^). 

2. Check Nodes: For i = 1 , 2, . . . , N 

Let {ii, i 2 , ■ ■ ■ , id c } be the indices of the variable nodes 
connected to the i lh check node t^. Let the message coming 
from variable node x ij to the check node ri at t th iteration 
be for j = 1, . . . , d c . Then the outgoing message 

from check node to variable node is 

The messages are sent according to the schedule S. 
3. Variable Nodes: For k = 1,2, ... ,M 

Let {ki,k 2 , . . . ,kd v } be the indices of the check nodes 
connected to the fc* variable node x k . Let the incoming 
message from the check node r kj to the variable node x k at 
the t lh iteration be (fi^ , v k J) for j = 1, . . . , d v . 

a. EM update: Let 

T/W _ ( ST<1 V 1 1 \ .,(*) _ v {t) ( v-<i„ 

v k - I 2^=1 7*1 + ^ry I ' Mfe — v k I Z^=i 7*1 

\ kj k / \ kj 

Then the EM update is j3 k t] = ^ . 

b. Message updates: The outgoing message from variable 
node Xk to check node at the (t + l) th iteration is given 

by (MrM! +1) ), where 



(t+i) _ / ^rd v i i i 



and 



The messages are sent according to the schedule S. 

4. Iterations .-Repeat (2) and (3) until stopping criterion T is 
reached. 

5. Decisions: For the fc* variable node Xk, let the incoming 
messages be {^\v^) for j = 1, . . . , d v . Let 



and 



d v 1 I 1 

(T) - 



-1 



■'3- 

Output: The estimate is x = (x\, x 2 , . . . , xm) t ■ 
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Next, we motivate for our reweighing approach. In our 
algorithms, the initial choice of {/3J? = |(F T r)fc| 2 /d 2 } is 
based on the intuition that (3k must be proportional to \xk\ 2 . 
By providing a better estimate for the initial the rate 

of convergence may be improved. The algorithm is initiated 
with as above and is run for T ri iterations. At the end 
of this stage, we re-initialize to be 



-(T ri ),2 



|(F r (r-Fx^)))J 2 /<t 



and the algorithm is run for T r2 iterations. This process is 
repeated recursively until convergence or 1Z times. We note 
that Y. k= i T r k = T, where T is the original number of 
fixed iterations. Thus the total number of iterations remains 
unchanged when we use reweighing. 

IV. Simulation Details 
A. Simulation Setup 

In our simulations we used LDFs with parameters (3,6), 
(3,12) and (3,24) for M/N = 2,4,8 and M = 10000. We 
constructed these frames using the progressive edge growth 
algorithm [22], avoiding cycles of length 4 when possible [^] 
Simulations will be presented for SNR= 12, 24, 36 dB, as well 
as the noiseless case. For various choices of L and SNR, we 
ran 1000 Monte-Carlo simulations for each value, where x 
is generated as a signal with L non-zero elements that are 
picked from a Gaussian distribution. The support of x is picked 
uniformly at random. Once x is generated, it is normalized 
such that ||Fx|| 2 = VN. Thus SNR= 101og 10 4y. 

Let Q be the genie decoder that has full information about 
supp(x) = {i : Xi 0}. Let the output of this decoder be 
Xgenie = Q{ r ) obtained by solving the least squares problem 
involving r and the matrix formed by the columns of F 
specified by supp(x). We define the following genie distortion 
measure: 



^-geme \ \ 2 



d s (x,x fl 



This distortion measure is invariant to the scaling of x for a 
fixed SNR. For any other recovery algorithm that outputs an 
estimate x, we let 

j / - \ _ I l x ~ jMjj 

(X, X e j — 2 ; 



where the subscript e denotes the estimation procedure. We 
will be interested in the performance of an estimation proce- 
dure with respect to the genie decoder. To this end, we define 



*-geme \ I 2 



4(x,x e ) 

dg (x. yigenie) 



We will be interested in this quantity averaged over K 
Monte-Carlo simulations, and converted to dB. The closer 
this quantity is to dB means the closer the performance 
of the estimation procedure is to the performance of the genie 
decoder. 

2 We also tested LDFs with 4 cycles and this does not seem to have an 
adverse effect on the average distortion in the presence of noise. 



In other cases, such as the noiseless case, we will be 
interested in the empirical probability of recovery. For K 
Monte-Carlo simulations, this is given by 



1 



K 

Prec = K ^ 
fe=i 



I(x ~ X e ), 



where I(-) is the indicator function for (•) (1 if (•) is true, 
otherwise). We will define the relation x ~ x e to be true only 
if supp(x) = supp(x e ), unless otherwise specified. 

A number of different stopping criterion can be used for T: 
1) x converges, 2) The minimum value of {||r — Fx^l^j-t 
does not change for T d iterations , 3) A fixed number of 
iterations T is reached. In our simulations we use criterion 
two with T d = 30 and T = 500. These values were chosen to 
make sure that the algorithms did not stop too prematurely. The 
message passing schedule S is described in detail in Appendix 
[I] Finally, for the reweighted algorithm we use 10 reweighings 
with T ri = ■■■ = T rw = T/10. 

B. Simulation Results 

Simulation results are presented in Figure [5] for exactly 
sparse signals. 

For comparison to our algorithms, we include results for 
CoSaMP [30] and d based methods [10], [11], [13], [16], 
[19]. For these algorithms we used partial Fourier matrices as 
measurement matrices. The choice of these matrices is based 
on their small storage requirements (in comparison to Gaussian 
matrices), while still satisfying restricted isometry principles. 
For CoSaMP, we used 100 iterations of the algorithm (and 150 
iterations of Richardson's iteration for calculating least squares 
solutions). For l\ based methods, we used the L1MAGIC 
package in the noiseless case. In the noisy case, we used both 
L1MAGIC, and the GPSR package (with Barzilai-Borwein 
Gradient Projection with continuation and debiasing). Since 
these two methods approximately perform the same, we in- 
clude the results for GPSR here. In the implementation of 
GPSR we fine-tune the value of r and observe that r = 
O.OOlHF^Hoo gives the best performance. 

Since the outputs of t\ based methods and SuPrEM I are 
not sparse, we threshold x to its L largest coefficients and 
postulate these are the locations of the sparse coefficients. 
For all methods, we solve the least squares problem involving 
r and the matrix formed by the columns of F specified by 
the final estimate for the locations of the sparse coefficients. 
For partial Fourier matrices we use Richardson's iteration to 
calculate this vector, whereas for LDFs we use the LSQR 
algorithm which also has O(M) complexity [31]. 

C. Discussion of The Results 

The simulation results indicate that the SuPrEM algorithms 
outperform the other state-of-the-art algorithms. In the low 
SNR regime (SNR = 12 dB), SuPrEM algorithms and the 
£i methods have similar performance. In moderate and high 
SNR regimes, we see that SuPrEM algorithms significantly 
outperform the other algorithms both in terms of distortion 
and in terms of the maximum sparsity they can work at. 
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Fig. 5. Performance comparison of recovery algorithms for sparse signals with Gaussian non-zero components. 



Furthermore for different values of N, the maximum sparsity 
scales as L = 0(N/log(M/N)), which is the same scaling 
as those of other methods. As we discussed previously the 
performance of SuPrEM I degrades as sparsity and SNR 
increases. We also observe that the reweighted SuPrEM II 
algorithm outperforms the regular SuPrEM II algorithm, even 
though the maximum number of iterations are the same. 

Finally, compared to the other methods for the noiseless 
problem, the SuPrEM algorithms can recover signals that 
have a higher number of non-zero elements. In this case, the 
reweighted algorithm performs the best, and converges faster. 
We also note that the results presented for CoSaMP and l\ 
based methods for the noiseless case are optimistic, since we 
declare success in recovery if d e (x,x e ) < 1CP 6 . We needed 
to introduce this measure, since these algorithms tend to miss 
a small portion of the support of x containing elements of 
small magnitude. 



We also note that for both partial Fourier matrices and 
LDFs, the quantity d s (x, x se „i e ) is almost the same for a fixed 
L and SNR. This means that 2? e / g (x, x e , x ge „i e ) provides 
an objective performance criterion in terms of relative mean- 
square error with respect to the genie bound, as well as in 
terms of absolute distortion error d e (x, x e ). 

D. Simulation Results for Natural Images 

For the testing of compressible signals, instead of using 
artificially generated signals, we used real-world compress- 
ible signals. In particular, we compressively sensed the db2 
wavelet coefficients of the 256 x 256 (raw) peppers image using 
N = 17000 measurements. Then we used various recovery 
algorithms to recover the wavelet coefficients, and we did the 
inverse wavelet transform to recover the original image. 

For SuPrEM algorithms, we used a rate (3, 12) LDF with 
M — 68000 (the wavelet coefficients vector was padded 
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Fig. 6. Performance comparison of recovery algorithms with a 256 X 256 natural image whose db2 wavelet coefficients are compressively sensed with N 
= 17000 measurements. 



with zeros to match the dimension). We set L = 8000 (the 
maximum sparsity the algorithm converged at) for SuPrEM II. 
We ran the algorithm first with a = 0. We also accomodated 
for noise, and estimated the per measurement noise to be 
a = 0.1^=- and ran the algorithm again 3 We ran our 
algorithms for just 50 iterations. For the reweignted SuPrEM II 
algorithm, we let a = and we reweighed after 5 steps of the 
algorithm for a total of 10 reweighings. For SMP, we used the 
SMP package [7]. We used a matrix generated by this package, 
and L = 8000. For the remaining methods, we used partial 
Fourier matrices whose rows were chosen randomly. For t\ 

3 With this value of a, SuPrEM I also provides a similar performance. 
However since the output in this case is very similar to that of SuPrEM II, 
we do not include it in the figure. 



with equality constraints, we used the L1MAGIC package. For 
LASSO, we used the GPSR package and r = O.OOlHF^Hoo, 
as described previously, and we thresholded the output to 
L = 8000 sparse coefficients and solved the appropriate least 
squares problem to get the final estimate. For CoSaMP and 
Subspace Pursuit, we used 100 iterations of the algorithm (and 
150 iterations for the Richardson's iteration for calculating 
the least square solutions). For these algorithms, we used 
L = 3000 for CoSaMP, and L = 3500 for Subspace Pursuit. 
These are slightly lower than the maximum sparsities they 
converged at (L — 3500 and L = 4000 respectively), but the 
values we used resulted in better visual quality and PSNR 
values. The results are depicted in Figure [6] 

The PSNR values for the methods are as follows: 23.41 
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dB for SuPrEM II, 23.83 dB for SuPrEM II (with non-zero 
a 2 ), 24.79 for SuPrEM II (reweighted), 20.18 dB for CoSaMP, 
19.51 dB for SMP, 21.62 dB for £ lt 23.61 dB for LASSO, 
21.27 dB for Subspace Pursuit. Among the algorithms that 
assume no knowledge of noise, we see that SuPrEM II out- 
performs the other algorithms both in terms of PSNR value and 
in terms of visual quality. The two algorithms that accomodate 
noise, SuPrEM II (in this case SuPrEM I also produces a 
similar output) and LASSO have similar PSNR values. Finally, 
the reweighted SuPrEM II also assumes no knowledge of 
noise, and outperforms all other methods by about 1 dB and 
also in terms of visual quality, without requiring more running 
time. 

E. Further Results 

We studied the effect of the change of degree distributions. 
For a given M/N ratio, we need to keep the ratio of d c /d v 
fixed however the values can be varied. Thus we compared 
the performance of d v = 3 LDFs to d v = 5 LDFs, and 
observed that the latter actually performed sligthly better. 
However, having a higher d v means more operations are 
required. We also observed that the number of iterations 
required for convergence was slightly higher. Thus we chose 
to use d v = 3 LDFs that allowed faster decoding. We also 
note that increasing d v too much (while keeping M/N fixed) 
results in performance deterioration, since the graph becomes 
less sparse, and we run into shorter cycles which affect the 
performance of SPA. 

We also tested the performance of our constructions and 
algorithms at M = 100000. With L/M and N/M fixed, 
interestingly the performance improves as M — > oo for 
Gaussian sparse signals for a fixed maximum number of 
500 iterations. This is in line with intuitions drawn from 
Shannon Theory [3]. Another interesting observation is that 
the number of iterations remain unchanged in this setting. In 
general, we observed that the number of iterations required for 
convergence is only a function of L/M and does not change 
with M. 

V. Conclusion 

In this paper, we constructed an ensemble of measurement 
matrices with small storage requirements. We denoted the 
members of this ensemble as Low Density Frames (LDF). For 
these frames, we provided sparse reconstruction algorithms 
that have O(M) complexity and that are Bayesian in nature. 
We evaluated the performance of this ensemble of matrices and 
their decoding algorithms, and compared their performance to 
other state-of-the-art recovery algorithms and their associated 
measurement matrices. We observed that in various cases 
of interest, SuPrEM algorithms with LDFs outperformed the 
other algorithms with partial Fourier matrices. In particular, 
for Gaussian sparse signals and Gaussian noise, we are within 
2 dB range of the theoretical lower bound in most cases. 

There are various interesting research problems in this area. 
One is to find a deterministic message-passing schedule that 
performs as well as (or better than) our probabilistic message- 
passing schedule and that is amenable to analysis. Another 



open problem is to analyze the performance of the iterative 
decoding algorithms for the LDFs theoretically, which may 
in turn lead to useful design tools (like Density Evolution 
[33]) that might help with the construction of LDFs with 
irregular degree distributions. Adaptive measurements using 
the soft information available about the estimates, as well as 
online decoding (similar to Raptor Codes [37]) is another open 
research area. Finally, if further information is available about 
the statistical properties of a class of signals (such as block- 
sparse signals or images represented on wavelet trees as in 
[5]), the decoding algorithms may be changed accordingly to 
improve performance. 

Appendix I 

Details On The Message-Passing Schedule 

A message-passing schedule determines the order of mes- 
sages passed between variable and check nodes of a factor 
graph. Traditionally, with LDPC codes, the so-called "flood- 
ing" schedule is used. In this schedule, at each iteration, 
all the variable nodes pass messages to their neighboring 
check nodes. Subsequently, all the check nodes pass messages 
to their neighboring variable nodes. For a cycle-free graph, 
SPA with a flooding schedule correctly computes a-posteriori 
probabilities [9], [49]. An alternative schedule is the "serial" 
schedule, where we go through each variable node serially and 
compute the messages to the neighboring nodes. The order in 
which we go through variable nodes could be lexicographic, 
random or based on reliabilities. 

In this section, we propose the following schedule based 
on the intuition derived from our simulations and results from 
LDPC codes [27], [49]: For the first iteration, all the check 
nodes send messages to variable nodes and vice-versa in 
a flooding schedule. After this iteration, with probability \ 
each check node is "on" or "off". If a check node is off, 
it marks the edges connected to itself as an "inactive", and 
sends back the messages it received to the variable nodes. If 
a check node is on, it marks the edges connected to itself as 
"active" and computes a new message. At the variable nodes, 
when calculating the new beta, we only use the information 
coming from active edges. That is for k = 1,2, ... ,M, let 
{k\, ki, ... , fc^} be the indices of the check nodes connected 
to the fc th variable node Xk- Let the incoming message from 
the check node to the variable node Xk at the t lh iteration 
be (fj,^,v^) for j = 1, . . . , d v . We will have 



E 



(k.kj) is an active edge V k 



(*) 



+ 



(t-1) 



l J k 



(t) = x w 



ft 

2^ At) 

, (k,kj) is an active edge kj , 



and 



Thus when there is no active edge, we do not perform a 
(3 update. For the special case when there is only one active 
edge (k,kj), we let jJ^ = jj,k y This is because the intrinsic 
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information is more valuable, and the estimate on (3^ ^ tends [22 
to be not as reliable. When we calculate the point estimate, 
we use all the information at the node, including the reliable 
and unreliable edges, i.e. 



E 



i 



E 



It is noteworthy that the flooding schedule and serial sched- 
ules tend to converge to local minima and they do not perform 
as well as this schedule we proposed. 
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